AI and Science

BOINC, from UC Berkeley, is a project that lets scientific computing tasks run on your computer in the background. It’s similar to projects like SETI@Home, but more general; it isn’t tied to a specific research project.

IBM and NASA built a science-oriented LLM trained on “60 billion tokens on a corpus of astrophysics, planetary science, earth science, heliophysics, and biological and physical sciences data. Unlike a generic tokenizer, the one we developed is capable of recognizing scientific terms such as “axes” and “polycrystalline.” More than half of the 50,000 tokens our models processed were unique compared to the open-source RoBERTa model on Hugging Face.”

Both models are available on Hugging Face: the encoder model can be further fine-tuned for applications in the space domain, while the retriever model can be used for information retrieval in RAG applications.

Microsoft lists AI in Science as one of its big trends for 2024, including tools for sustainable agriculture, the world’s largest image-based AI model to fight cancer, and using advanced AI to find new drugs for infectious diseases and new molecules for breakthrough medicines.

Implications

AI and the transformation of social science research

Grossmann et al. (2023)

A well-summarized take. See also the extensive references, including an apparently exhaustive list of examples of AI in social sciences research.

LLMs can allow the kinds of simulations that until now were limited to domains that allowed numerical quantification – e.g. particle physics, epidemiology, economics. Because an LLM can respond to surveys, it might also offer a way to simulate research into human behavior.

Important caveat: “Already, LLM engineers have been fine-tuning pretrained models for the world that “should be” (12) rather than the world that is, and such efforts to mitigate biases in AI training (2, 13) may thus undermine the validity of AI-assisted social science research.”

See ABM: Agent-based models are composed of: (1) numerous agents specified at various scales; (2) decision-making heuristics; (3) learning rules or adaptive processes; (4) an interaction topology; and (5) an environment.
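
To make the five ingredients concrete, here is a minimal sketch of a toy opinion model. It is not taken from Grossmann et al.; the `Agent` fields, the ring network, the `media_signal` environment, and all parameter values are assumptions chosen purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    # (1) Agents: each holds an opinion in [0, 1] and an openness to influence.
    opinion: float
    openness: float


def ring_topology(n, k=1):
    """(4) Interaction topology: each agent sees k neighbours on each side of a ring."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


def step(agents, neighbours, environment):
    """Advance the model by one synchronous tick."""
    new_opinions = []
    for i, agent in enumerate(agents):
        local_mean = sum(agents[j].opinion for j in neighbours[i]) / len(neighbours[i])
        # (2) Decision heuristic: aim for a blend of the neighbours' mean opinion
        # and a global signal supplied by the environment.
        w = environment["media_weight"]
        target = (1 - w) * local_mean + w * environment["media_signal"]
        new_opinions.append(agent.opinion + agent.openness * (target - agent.opinion))
    for agent, new_opinion in zip(agents, new_opinions):
        agent.opinion = new_opinion
        # (3) Learning/adaptive rule: agents grow slightly less open over time.
        agent.openness *= environment["openness_decay"]


def run(n_agents=50, ticks=100, seed=0):
    rng = random.Random(seed)
    agents = [Agent(opinion=rng.random(), openness=0.5) for _ in range(n_agents)]
    neighbours = ring_topology(n_agents, k=2)
    # (5) Environment: shared conditions that every agent is exposed to.
    environment = {"media_signal": 0.7, "media_weight": 0.1, "openness_decay": 0.99}
    for _ in range(ticks):
        step(agents, neighbours, environment)
    return agents
```

In the LLM-flavored version the paper has in mind, the hand-written heuristic in `step` is presumably the piece a language model would replace, with the topology and environment staying as ordinary code.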

via Penn Today


AI in Biotech

Software

OpenAlex is a search engine like Google Scholar, but with a free API.

We index over 250M scholarly works from 250k sources, with extra coverage of humanities, non-English languages, and the Global South.

We link these works to 90M disambiguated authors and 100k institutions, as well as enriching them with topic information, SDGs, citation counts, and much more.
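
Since the free API is the main draw, here is a rough sketch of what a query might look like using only the Python standard library. The endpoint, the `search`, `per-page`, and `mailto` parameters, and the response field names reflect my reading of the OpenAlex documentation and should be checked against the current docs; the helper names (`build_works_url`, `search_works`, `summarize`) are my own.

```python
import json
import urllib.parse
import urllib.request

# Assumed base endpoint for scholarly works; verify against the OpenAlex docs.
OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_works_url(query, per_page=5, mailto=None):
    """Build a full-text search URL for the works endpoint."""
    params = {"search": query, "per-page": per_page}
    if mailto:
        # Optional contact address that identifies the caller to the service.
        params["mailto"] = mailto
    return OPENALEX_WORKS_URL + "?" + urllib.parse.urlencode(params)


def summarize(work):
    """Keep a few fields from one work record; missing fields become None."""
    return {
        "title": work.get("display_name"),
        "year": work.get("publication_year"),
        "cited_by": work.get("cited_by_count"),
        "doi": work.get("doi"),
    }


def search_works(query, per_page=5, mailto=None, timeout=10):
    """Fetch one page of results (network call) and return short summaries."""
    url = build_works_url(query, per_page, mailto)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return [summarize(work) for work in payload.get("results", [])]
```

Calling `search_works("frozen blueberries nutrition")` should then return a short list of dictionaries, one per matching work, assuming the service is reachable and the response shape matches the assumptions above.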

GPT Researcher is an autonomous agent designed for comprehensive online research on a variety of tasks. (more details)

Elicit.org: search scientific papers with AI.

Paper-QA is a GitHub repository with examples of how to do scientific research reviews. It describes itself as a “minimal package for doing question and answering from PDFs or text files (which can be raw HTML). It strives to give very good answers, with no hallucinations, by grounding responses with in-text citations.”

Lateral

https://www.lateral.io/

Complete hours of reading in minutes. A web app that helps you read, find, share & organise your research in one place. So you can complete it up to 10x faster.

Zeta Alpha

https://www.zeta-alpha.com/ is an Amsterdam-based company that claims to offer “the best Neural Discovery Platform for AI and beyond. Use state-of-the-art Neural Search to improve how you and your team discover, organize and share knowledge.”

see Sara Hamburg tweet.

https://www.scholarcy.com/

Scispace

I uploaded directly from my Zotero Library.

Scispace front page

But when I asked it a basic question “What are the nutritional differences between frozen and fresh blueberries”

Worse, when I asked it:

“how does freezing impact the nutritional profile of blueberries”

The top answers related to electric vehicles, and “freezing” a pension plan.

Consensus

Consensus is free for unlimited searches and AI-powered filters, plus 20 credits per month for the more advanced features like GPT-4 summaries and the Consensus Meter. $9 / month for everything, about half that for students.

Try “does consuming aspartame increase the risk of cancer?”

Consensus AI-Powered Search Engine

Apply filters, like this one that limits results to studies validated with RCTs:

Scite_

1.2b citation statements extracted and analyzed; 181m articles, book chapters, preprints, and datasets.

Smart Citations allow users to see how a publication has been cited by providing the context of the citation and a classification describing whether it provides supporting or contrasting evidence for the cited claim.

Smart Citations

Interview with the developer of Scite_

Discovery for science and beyond

Google’s FunSearch: Making new discoveries in mathematical sciences using Large Language Models

Romera-Paredes et al. (2024)

FunSearch works by pairing a pre-trained LLM, whose goal is to provide creative solutions in the form of computer code, with an automated “evaluator”, which guards against hallucinations and incorrect ideas. By iterating back-and-forth between these two components, initial solutions “evolve” into new knowledge. The system searches for “functions” written in computer code; hence the name FunSearch.
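
The propose-and-evaluate loop can be sketched in a few lines. This is a toy illustration, not the FunSearch implementation: the `propose` function is a random stand-in for the LLM (including a simulated "hallucination" that emits broken code), the quadratic target inside `evaluate` is a hypothetical objective, and the real system's prompting and program database are left out entirely.

```python
import random


def render(coeffs):
    """Turn a candidate (a, b, c) into Python source for a function f."""
    a, b, c = coeffs
    return f"def f(x):\n    return {a} * x * x + {b} * x + {c}\n"


def evaluate(source):
    """Automated evaluator: run the candidate code and score it.

    Returns None for anything that fails to run, so bad ideas never
    enter the pool. NOTE: exec is not sandboxed here; this is a toy.
    """
    namespace = {}
    try:
        exec(source, namespace)
        f = namespace["f"]
        # Hypothetical objective: match x**2 + 3*x + 2 on a few inputs.
        return -sum(abs(f(x) - (x * x + 3 * x + 2)) for x in range(-5, 6))
    except Exception:
        return None


def propose(parent, rng):
    """Stand-in for the LLM: nudge one coefficient of the parent by +/-1."""
    child = list(parent)
    child[rng.randrange(3)] += rng.choice([-1, 1])
    source = render(child)
    if rng.random() < 0.2:
        # Simulated hallucination: the code refers to an undefined name y.
        source = source.replace("x * x", "x * y")
    return tuple(child), source


def fun_search(iterations=2000, pool_size=10, seed=0):
    """Iterate between proposer and evaluator, keeping the best candidates."""
    rng = random.Random(seed)
    start = (0, 0, 0)
    pool = [(evaluate(render(start)), start)]
    for _ in range(iterations):
        _, parent = rng.choice(pool)
        child, source = propose(parent, rng)
        score = evaluate(source)
        if score is None:
            continue  # rejected by the evaluator
        pool.append((score, child))
        pool = sorted(set(pool), reverse=True)[:pool_size]
    return pool[0]  # (best score, best coefficients)
```

The returned candidate is never worse than the starting one, because only scored, runnable programs enter the pool. Whether it reaches the exact target depends on the seed and the iteration budget.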

References

Grossmann, Igor, Matthew Feinberg, Dawn C. Parker, Nicholas A. Christakis, Philip E. Tetlock, and William A. Cunningham. 2023. “AI and the Transformation of Social Science Research.” Science 380 (6650): 1108–9. https://doi.org/10.1126/science.adi1778.
Romera-Paredes, Bernardino, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, et al. 2024. “Mathematical Discoveries from Program Search with Large Language Models.” Nature 625 (7995): 468–75. https://doi.org/10.1038/s41586-023-06924-6.